Framework
Model
Trainer
class eole.trainer.Trainer(model, train_loss, valid_loss, scoring_preparator, valid_scorers, optim, trunc_size=0, norm_method='sents', accum_count=[1], accum_steps=[0], n_gpu=1, gpu_rank=1, parallel_mode='data_parallel', report_manager=None, with_align=False, model_saver=None, average_decay=0, average_every=1, earlystopper=None, dropout=[0.3], attention_dropout=[0.1], dropout_steps=[0], zero_out_prompt_loss=False, estim_loss_lambda=[1.0], estim_loss_lambda_steps=[0])
Bases: object
Class that controls the training process.
- Parameters:
  - model (eole.models.model.BaseModel) – model to train
  - train_loss (eole.utils.loss.LossComputeBase) – training loss computation
  - valid_loss (eole.utils.loss.LossComputeBase) – validation loss computation
  - scoring_preparator (eole.predict.utils.ScoringPreparator) – preparator for the calculation of metrics via the _eval_handler method
  - valid_scorers (dict) – keeps in memory the current values of the validation metrics
  - optim (eole.utils.optimizers.Optimizer) – the optimizer responsible for updates
  - trunc_size (int) – length of truncated backpropagation through time
  - accum_count (list) – accumulate gradients this many times.
  - accum_steps (list) – steps at which the gradient accumulation count changes.
  - n_gpu (int) – number of GPUs.
  - gpu_rank (int) – ordinal rank of the GPU in the list.
  - report_manager (eole.utils.ReportMgrBase) – the object that creates reports, or None
  - with_align (bool) – whether to jointly learn alignment (Transformer)
  - model_saver (eole.models.ModelSaverBase) – the saver used to save checkpoints; nothing is saved if this parameter is None.
  - average_decay (float) – cf. opt.average_decay
  - average_every (int) – average model every x steps.
  - earlystopper (eole.utils.EarlyStopping) – adds an early stopping mechanism
  - dropout (float) – dropout value in RNN or FF layers.
  - attention_dropout (float) – dropout in attention layers.
  - dropout_steps (list) – steps at which the dropout values change.
  - zero_out_prompt_loss (bool) – whether to zero out the prompt loss (mostly for LLM finetuning).
  - estim_loss_lambda (List[float]) – weight applied to estimator loss
  - estim_loss_lambda_steps (List[int]) – steps at which to apply the estimator loss weights
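Several Trainer parameters come in value/step pairs (accum_count with accum_steps, dropout with dropout_steps, estim_loss_lambda with estim_loss_lambda_steps). The sketch below shows one plausible way such a schedule could be resolved. The helper name scheduled_value and the rule "values[i] takes effect once the step exceeds steps[i]" are assumptions made for illustration, not a description of the eole internals.

```python
def scheduled_value(values, steps, step):
    """Return the scheduled value active at `step`.

    Assumption: values[i] takes effect once `step` exceeds steps[i],
    and later entries override earlier ones.
    """
    if len(values) != len(steps):
        raise ValueError("values and steps must have the same length")
    current = values[0]
    for value, start in zip(values, steps):
        if step > start:
            current = value
    return current


# Hypothetical schedule: accumulate 1 batch at first, 4 batches after step 1000.
accum_count = [1, 4]
accum_steps = [0, 1000]
early = scheduled_value(accum_count, accum_steps, 500)
late = scheduled_value(accum_count, accum_steps, 1500)
```

Under these assumptions, the same lookup would serve for the dropout and estimator-loss schedules, since they share the same list-of-values and list-of-steps shape.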
train(train_iter, train_steps, save_checkpoint_steps=5000, valid_iter=None, valid_steps=10000)
The main training loop: iterates over train_iter and optionally runs validation on valid_iter.
- Parameters:
  - train_iter – An iterator that returns the next training batch.
  - train_steps – Run training for this many iterations.
  - save_checkpoint_steps – Save a checkpoint every this many iterations.
  - valid_iter – A generator that returns the next validation batch.
  - valid_steps – Run evaluation every this many iterations.
- Returns: training loss statistics
- Return type: eole.utils.Statistics
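To make the interaction between train_steps, valid_steps and save_checkpoint_steps concrete, here is a minimal sketch that only lists when validation and checkpointing would fire. The function run_schedule is hypothetical. The relative order of validation and saving within a step is an assumption, and the real loop also handles gradient accumulation, reporting and early stopping.

```python
def run_schedule(train_steps, save_checkpoint_steps=5000,
                 valid_steps=10000, has_valid_iter=True):
    """List the (step, action) events a loop with these intervals would trigger."""
    events = []
    for step in range(1, train_steps + 1):
        # One optimizer update on the next training batch would happen here.
        if has_valid_iter and step % valid_steps == 0:
            events.append((step, "validate"))
        if save_checkpoint_steps and step % save_checkpoint_steps == 0:
            events.append((step, "save"))
    return events


events = run_schedule(train_steps=20000)
```

With the default intervals, a 20000-step run in this sketch validates twice and saves four times.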
validate(valid_iter, moving_average=None)
Validate model.
- Parameters: valid_iter – validation data iterator
- Returns: validation loss statistics
- Return type: eole.utils.Statistics
class eole.utils.Statistics(loss=0, auxloss=0, n_batchs=0, n_sents=0, n_words=0, n_correct=0, computed_metrics={})
Bases: object
Accumulator for loss statistics. Currently calculates:
- accuracy
- perplexity
- elapsed time
accuracy()
compute accuracy
static all_gather_stats(stat, max_size=4096)
Gather a Statistics object across multiple processes/nodes
- Parameters:
  - stat (Statistics) – the statistics object to gather across all processes/nodes
  - max_size (int) – max buffer size to use
- Returns: Statistics, the updated stats object
static all_gather_stats_list(stat_list, max_size=4096)
Gather a list of Statistics objects across all processes/nodes
- Parameters:
  - stat_list (list([Statistics])) – list of statistics objects to gather across all processes/nodes
  - max_size (int) – max buffer size to use
- Returns: list of updated stats
- Return type: our_stats (list([Statistics]))
computed_metric(metric)
Check whether a metric (e.g. TER or BLEU) has been computed and return it
elapsed_time()
compute elapsed time
log_tensorboard(prefix, writer, learning_rate, patience, step)
display statistics to tensorboard
output(step, num_steps, learning_rate, start)
Write out statistics to stdout.
- Parameters:
  - step (int) – current step
  - num_steps (int) – total number of steps
  - learning_rate (float) – current learning rate
  - start (int) – start time of step.
ppl()
compute perplexity
update(stat, update_n_src_words=False)
Update statistics by summing values with another Statistics object
- Parameters:
  - stat – another Statistics object
  - update_n_src_words (bool) – whether to update (sum) n_src_words or not
xent()
compute cross entropy
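The relationship between the accumulated counters and the reported metrics can be sketched as follows. MiniStats is a hypothetical stand-in written for this example, not the eole class. The formulas (accuracy as a percentage of correct words, cross entropy as loss per word, perplexity as the exponential of the cross entropy with a cap on the exponent) are the conventional definitions and are assumed here rather than taken from the eole source.

```python
import math
from dataclasses import dataclass


@dataclass
class MiniStats:
    """Hypothetical loss accumulator used only for illustration."""
    loss: float = 0.0
    n_words: int = 0
    n_correct: int = 0

    def update(self, other):
        # Sum the counters of another accumulator into this one.
        self.loss += other.loss
        self.n_words += other.n_words
        self.n_correct += other.n_correct

    def accuracy(self):
        return 100.0 * self.n_correct / self.n_words

    def xent(self):
        return self.loss / self.n_words

    def ppl(self):
        # The cap on the exponent (assumed value) avoids overflow early in training.
        return math.exp(min(self.loss / self.n_words, 100))


total = MiniStats()
total.update(MiniStats(loss=20.0, n_words=10, n_correct=6))
total.update(MiniStats(loss=10.0, n_words=10, n_correct=9))
```

Because the counters are summed before the ratios are taken, the resulting metrics are word-weighted averages over all batches rather than averages of per-batch metrics.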
Loss
class eole.utils.loss.LossCompute(criterion, generator, lambda_coverage=0.0, lambda_align=0.0, tgt_shift_index=1, vocabs=None, lm_generator=None, lm_prior_lambda=None, lm_prior_tau=None, lm_prior_model=None)
Bases: Module
Class for managing efficient loss computation. Handles accumulating multiple loss computations.
- Parameters:
  - criterion (nn loss function) – NLLLoss or custom loss
  - generator (nn.Module) – module that maps the output of the decoder to a distribution over the target vocabulary.
  - lambda_coverage – hyperparameter to apply coverage attention, if any
  - lambda_align – hyperparameter for alignment loss
  - tgt_shift_index (int) – 1 for NMT, 0 for LM
  - vocabs – full vocabs with specials
  - lm_generator (ctranslate2.Generator) – LM generator
  - lm_prior_lambda (float) – weight of LM model in loss
  - lm_prior_tau (float) – scaler for LM loss
forward(batch, output, attns, trunc_start=0, trunc_size=None, estim=None)
Compute the forward loss, supports truncated BPTT for long sequences by taking a range in the decoder output sequence to back propagate in. Range is from (trunc_start, trunc_start + trunc_size). Truncation is an approximate efficiency trick to relieve the memory required in the RNN buffers.
- Parameters:
  - batch (batch) – batch of labeled examples
  - output (FloatTensor) – output of decoder model, of shape (batch, tgt_len, hidden)
  - attns (dict) – dictionary of attention weights, each of shape (batch, tgt_len, src_len)
  - trunc_start (int) – starting position of truncation window
  - trunc_size (int) – length of truncation window
- Returns: A tuple with the loss and a eole.utils.Statistics instance.
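The truncation range described above can be illustrated by enumerating the windows a caller would pass as trunc_start and trunc_size. The helper truncation_windows is hypothetical, and treating a trunc_size of 0 or None as "no truncation" is an assumption based on the defaults shown in the signatures.

```python
def truncation_windows(tgt_len, trunc_size):
    """Return (trunc_start, trunc_end) ranges that cover tgt_len positions.

    Assumption: a trunc_size of 0 or None means the whole sequence is
    processed as a single window.
    """
    if not trunc_size:
        return [(0, tgt_len)]
    return [
        (start, min(start + trunc_size, tgt_len))
        for start in range(0, tgt_len, trunc_size)
    ]


windows = truncation_windows(tgt_len=10, trunc_size=4)
```

In truncated BPTT, gradients would be backpropagated within each window separately, so the last window is simply shorter when tgt_len is not a multiple of trunc_size.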
classmethod from_config(config, model, vocabs, train=True)
Returns a subclass which wraps around an nn.Module subclass (such as nn.NLLLoss) which defines the loss criterion. The LossCompute object passes relevant data to a Statistics object which handles training/validation logging. The Criterion and LossCompute options are triggered by opt settings.
ignore_prompt(batch)
Mask the prompt in the target side of the batch examples in order to set the loss of the prompt to zero.
For finetuning on specific tasks. The end of the prompt must be indicated by the DefaultTokens.MASK_BEFORE placeholder.
The masks are supposed to be properly handled by the loss criterion (e.g. nn.CrossEntropyLoss).
- Parameters: batch – The current batch.
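The idea behind prompt masking can be shown on a plain list of tokens. Everything in this sketch is hypothetical: the marker string, the padding token and the mask_prompt helper are illustrative names, and the actual method works on batched tensors. Whether the marker itself is masked along with the prompt is also an assumption.

```python
MASK_BEFORE = "<mask_before>"  # hypothetical marker; the real value is defined in DefaultTokens
PAD = "<pad>"                  # hypothetical padding token ignored by the loss


def mask_prompt(tgt_tokens, marker=MASK_BEFORE, pad=PAD):
    """Replace the prompt (everything up to and including the marker) with padding."""
    if marker not in tgt_tokens:
        # No marker found: leave the example untouched.
        return list(tgt_tokens)
    end = tgt_tokens.index(marker)
    return [pad] * (end + 1) + list(tgt_tokens[end + 1:])


masked = mask_prompt(["Translate", ":", "hello", MASK_BEFORE, "bonjour", "</s>"])
```

A criterion configured to ignore the padding index would then contribute zero loss for the masked positions, leaving only the answer tokens to drive the gradient.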
Optimizer
class eole.utils.Optimizer(optimizer, learning_rate, learning_rate_decay_fn=None, max_grad_norm=None)
Bases: object
Controller class for optimization. Mostly a thin wrapper for optim, but also useful for implementing rate scheduling beyond what is currently available. Also implements necessary methods for training RNNs such as grad manipulations.
- Parameters:
  - optimizer – A torch.optim.Optimizer instance.
  - learning_rate – The initial learning rate.
  - learning_rate_decay_fn – An optional callable taking the current step as argument and returning a learning rate scaling factor.
  - max_grad_norm – Clip gradients to this global norm.
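As an illustration of what a learning_rate_decay_fn could look like, the sketch below implements the well-known Noam schedule (linear warmup followed by inverse square root decay) as a callable from step to scaling factor. The names noam_decay and effective_lr, the default warmup_steps and model_size values, and the assumption that the factor is multiplied with the initial learning rate are all illustrative choices, not the eole implementation.

```python
def noam_decay(step, warmup_steps=4000, model_size=512):
    """Scaling factor: linear warmup followed by inverse square root decay."""
    step = max(step, 1)  # guard against 0 ** -0.5
    return model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def effective_lr(base_lr, step, decay_fn=None):
    """Assumed combination rule: initial rate times the scaling factor."""
    if decay_fn is None:
        return base_lr
    return base_lr * decay_fn(step)


lr_warmup = effective_lr(2.0, 2000, noam_decay)
lr_peak = effective_lr(2.0, 4000, noam_decay)
lr_late = effective_lr(2.0, 8000, noam_decay)
```

In this sketch the rate rises until warmup_steps and falls afterwards, so lr_peak is the largest of the three values.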
property amp
True if torch AMP mixed-precision training is used.
backward(loss)
Wrapper for the backward pass. Some optimizers require ownership of the backward pass.
classmethod from_config(model, config, checkpoint=None)
Builds the optimizer from options.
- Parameters:
  - cls – The Optimizer class to instantiate.
  - model – The model to optimize.
  - config – The dict of user options.
  - checkpoint – An optional checkpoint to load states from.
- Returns: An Optimizer instance.
learning_rate(step=None)
Returns the current learning rate.
step()
Update the model parameters based on current gradients.
Optionally applies gradient modification or updates the learning rate.
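One common form of gradient modification is clipping to a global norm, which is what max_grad_norm refers to. The following is a minimal pure-Python sketch of that technique on nested lists of floats. The helper clip_by_global_norm is hypothetical, and in a PyTorch setting torch.nn.utils.clip_grad_norm_ serves the same purpose.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so that their joint L2 norm is at most max_norm.

    grads is a list of per-parameter gradient lists. Returns the (possibly
    rescaled) gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for grad in grads for g in grad))
    if max_norm is None or max_norm <= 0 or total <= max_norm:
        # Assumption: a missing or non-positive max_norm disables clipping.
        return [list(grad) for grad in grads], total
    scale = max_norm / total
    return [[g * scale for g in grad] for grad in grads], total


clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

All gradients share one scale factor, so the update direction is preserved and only its magnitude is reduced.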
property training_step
The current training step.
zero_grad(set_to_none=True)
Zero the gradients of optimized parameters.
class eole.utils.AdaFactor(params, lr=None, beta1=0.9, beta2=0.999, eps1=1e-30, eps2=0.001, cliping_threshold=1, non_constant_decay=True, enable_factorization=True, ams_grad=True, weight_decay=0)
Bases: Optimizer
step(closure=None)
Perform a single optimization step to update parameters.
- Parameters: closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
NOTE
Unless otherwise specified, this function should not modify the .grad field of the parameters.
class eole.utils.FusedAdam(params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-08, eps_inside_sqrt=False, weight_decay=0.0, max_grad_norm=0.0, amsgrad=False)
Bases: Optimizer
Implements Adam algorithm. Currently GPU-only.
Requires Apex to be installed via python setup.py install --cuda_ext --cpp_ext.
- Parameters:
  - params (iterable) – iterable of parameters to optimize or dicts defining parameter groups.
  - lr (float, optional) – learning rate. (default: 1e-3)
  - betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square. (default: (0.9, 0.999))
  - eps (float, optional) – term added to the denominator to improve numerical stability. (default: 1e-8)
  - weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
  - amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond" (default: False) NOT SUPPORTED in FusedAdam!
  - eps_inside_sqrt (boolean, optional) – in the "update parameters" step, adds eps to the bias-corrected second moment estimate before evaluating square root instead of adding it to the square root of second moment estimate as in the original paper. (default: False)
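The effect of eps_inside_sqrt is easiest to see on a single scalar parameter. The sketch below is a plain-Python Adam update written for this example. It omits weight decay and always applies bias correction, and the function name adam_step is hypothetical rather than part of FusedAdam.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-8, eps_inside_sqrt=False):
    """One Adam update for a scalar; returns (new_param, new_m, new_v)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of its square
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    if eps_inside_sqrt:
        denom = math.sqrt(v_hat + eps)
    else:
        denom = math.sqrt(v_hat) + eps
    return param - lr * m_hat / denom, m, v


# A tiny gradient makes the two placements of eps visibly different.
p_out, _, _ = adam_step(1.0, 1e-8, 0.0, 0.0, t=1)
p_in, _, _ = adam_step(1.0, 1e-8, 0.0, 0.0, t=1, eps_inside_sqrt=True)
```

For gradients much larger than eps the two variants are nearly identical; they diverge only when the second moment estimate is comparable to or smaller than eps, where placing eps inside the square root damps the update more strongly.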
step(closure=None, grads=None, output_params=None, scale=1.0, grad_norms=None)
Performs a single optimization step.
- Parameters:
  - closure (callable, optional) – A closure that reevaluates the model and returns the loss.
  - grads (list of tensors, optional) – weight gradient to use for the optimizer update. If gradients have type torch.half, parameters are expected to be in type torch.float. (default: None)
  - output_params (list of tensors, optional) – A reduced precision copy of the updated weights written out in addition to the regular updated weights. Must be of the same type as the gradients. (default: None)
  - scale (float, optional) – factor to divide gradient tensor values by before applying to weights. (default: 1)